
@so-schen so-schen commented May 22, 2025

Requirements

  1. We need the DevOps team to package this migrate-config Python script into the rpc_node and supra version 9 Docker images for mainnet, so that operators do not need to install it locally. @darpan-supraoracles

  2. We need snapshot buckets created for the mainnet RPC and validator nodes:

        if is_validator; then
            bucket_name="mainnet-validator-snapshot"
        elif is_rpc; then
            bucket_name="mainnet-snapshot"
        fi

Note that the AWS access key in the script may need to be updated.

  3. The script supports an --assume-yes option that applies the default migration values without prompting. This is the most common use case for regular nodes, and it makes it possible to batch-process the config migration for our foundation nodes.

IMPORTANT note for DevOps
For specially designated nodes (i.e. archives, snapshot uploaders), use the interactive prompt instead:
Do not use --assume-yes for these nodes, because enable_snapshots and enable_pruning should retain their original values rather than take the defaults.
Run the script interactively, confirm each update, and check the final output of the migration.

  4. Operators should run the script interactively and follow the guidance in the prompts. Operators should not run the DB migration; instead, they should use snapshot sync to restart the node with the v9 image.
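
The split between batch and interactive runs described in the requirements above could be encoded roughly as follows. This is a minimal sketch and not part of the PR: the `SPECIAL_ROLES` set, the role labels, the `build_migrate_command` helper, and the output path are hypothetical. Only the `migrate-config` subcommands and flags are taken from the usage text in this description.

```python
import shlex

# Hypothetical role labels for nodes that must keep their original
# enable_snapshots / enable_pruning values (not defined by the PR).
SPECIAL_ROLES = {"archive", "snapshot-uploader"}


def build_migrate_command(kind, from_file, to_file, role="regular"):
    """Build a migrate-config argv list for one node.

    kind is "rpc" or "smr". --assume-yes is appended only for regular
    nodes, so special nodes always go through the interactive prompts.
    """
    if kind not in ("rpc", "smr"):
        raise ValueError(f"unknown config kind: {kind}")
    cmd = ["migrate-config", kind, "-p", "v7-v9", "-f", from_file, "-t", to_file]
    if role not in SPECIAL_ROLES:
        cmd.append("--assume-yes")
    return cmd


if __name__ == "__main__":
    # The output file name is an assumption made for illustration.
    for role in ("regular", "archive"):
        cmd = build_migrate_command(
            "rpc", "configs/config.toml", "configs/config_v9.toml", role
        )
        print(role, "->", shlex.join(cmd))
```

Keeping the decision in one place means a batch driver cannot accidentally pass `--assume-yes` for an archive or snapshot uploader node.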

v7-v9 config migration docs

Install the config migration script

From the root directory of the repo, install it as follows:
pip install node_management/config_migration

Usage

$ migrate-config --help
Usage: migrate-config [OPTIONS] COMMAND [ARGS]...

  Migration CLI for Supra configs.

Options:
  --help  Show this message and exit.

Commands:
  rpc  Migrate RPC config.
  smr  Migrate SMR config.

$ migrate-config rpc  --help
Usage: migrate-config rpc [OPTIONS]

  Migrate RPC config.

Options:
  -p, --migrate-path [v7-v9]  Migration path (choices: v7-v9)  [required]
  -f, --from-file PATH        Source config file  [required]
  -t, --to-file PATH          Output config file  [required]
  -y, --assume-yes            Assume yes for all prompts (default: False)
  --help                      Show this message and exit.
  
$ migrate-config smr  --help
Usage: migrate-config smr [OPTIONS]

  Migrate SMR config.

Options:
  -p, --migrate-path [v7-v9]  Migration path (choices: v7-v9)  [required]
  -f, --from-file PATH        Source config file  [required]
  -t, --to-file PATH          Output config file  [required]
  -y, --assume-yes            Assume yes for all prompts (default: False)
  --help                      Show this message and exit.

Example output

Running migration function: migrate_v7_to_v9

Scanning node root configuration ...
`connection_refresh_timeout_sec = 2` is not recommended for new version.
Do you want to apply the recommended config: `connection_refresh_timeout_sec = 1`? (assuming yes)
✓ Apply recommended config: `connection_refresh_timeout_sec = 1`

Scanning ledger configuration ...
✓ `enable_pruning` not found in original config, using new version's default value: True
✓ `enable_snapshots` not found in original config, using new version's default value: False

Scanning chain store configuration ...
✓ `enable_snapshots` not found in original config, using new version's default value: False

Scanning prune configuration ...

Scanning mempool configuration ...
`max_batch_delay_ms = 1500` is not recommended for new version.
Do you want to apply the recommended config: `max_batch_delay_ms = 500`? (assuming yes)
✓ Apply recommended config: `max_batch_delay_ms = 500`

Scanning moonshot configuration ...
`message_recency_bound_rounds = 20` is not recommended for new version.
Do you want to apply the recommended config: `message_recency_bound_rounds = 1000`? (assuming yes)
✓ Apply recommended config: `message_recency_bound_rounds = 1000`
`sync_retry_delay_ms = 2000` is not recommended for new version.
Do you want to apply the recommended config: `sync_retry_delay_ms = 1000`? (assuming yes)
✓ Apply recommended config: `sync_retry_delay_ms = 1000`
`timeout_delay_ms = 5000` is not recommended for new version.
Do you want to apply the recommended config: `timeout_delay_ms = 3500`? (assuming yes)
✓ Apply recommended config: `timeout_delay_ms = 3500`
Writing new config to /tmp/new_smr.toml
|----------------- Begin diff v7 to v9 -----------------|

..
|----------------- End diff v7 to v9 -----------------|

    ######################################################################
    # Config migrated from tests/smr_settings_v7.1.x.toml to /tmp/new_smr.toml.
    # 
    # Please review the diff above for changes made during migration.
    # 
    # Please ensure to use the new config file for target binary version.
    ######################################################################
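
The behaviour visible in the example output (replace a non-recommended value after confirmation, and fall back to the new version's default when a key is missing) can be illustrated with a small sketch. This is not the script's actual implementation: the `migrate_section` and `assume_yes` names are hypothetical, and the real script may also prompt for defaulted keys in interactive mode. The values are the ones shown in the output above.

```python
def migrate_section(original, recommended, defaults, confirm):
    """Return a migrated copy of one config section.

    original:    key -> value from the old config section
    recommended: key -> value recommended for the new version
    defaults:    key -> default for keys that are new in this version
    confirm:     callable(key, old, new) -> bool, asked before each change
    """
    migrated = dict(original)
    for key, new_value in recommended.items():
        if key not in migrated:
            continue
        old_value = migrated[key]
        if old_value != new_value and confirm(key, old_value, new_value):
            migrated[key] = new_value
    for key, default in defaults.items():
        # Keys missing from the old config take the new version's default.
        migrated.setdefault(key, default)
    return migrated


def assume_yes(key, old, new):
    # Mirrors the "(assuming yes)" lines in the example output.
    return True


if __name__ == "__main__":
    old_moonshot = {
        "message_recency_bound_rounds": 20,
        "sync_retry_delay_ms": 2000,
        "timeout_delay_ms": 5000,
    }
    recommended = {
        "message_recency_bound_rounds": 1000,
        "sync_retry_delay_ms": 1000,
        "timeout_delay_ms": 3500,
    }
    print(migrate_section(old_moonshot, recommended, {}, assume_yes))
```

Passing a `confirm` callable that returns `False` keeps the original value, which is the path an archive or snapshot uploader node would take for settings that must not change.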

Example scripts for the end-to-end migration from v7 to v9

Note that the scripts below are examples; they should be adapted rather than used directly.

  • node_management/migrate_config_and_db_mainnet_v7_to_v9.sh is an example for migrating the config and DB of our foundation nodes.
  • node_management/migrate_config_v7_to_v9_docker.sh is an example for migrating only the config of a node running in Docker.

@so-schen so-schen requested a review from isaacdoidge May 22, 2025 08:39

### Usage Example

alias rpc-v8=~/Documents/share/repo/smr-moonshot-testnet/target/devopt/rpc_node
Contributor

The node operators use Docker.

Author

@so-schen so-schen May 23, 2025

Hmm, I was thinking the operator guide would live on the web page, not here. We can update the example here, but we would need both a native and a Docker version.

@so-schen so-schen changed the title Sc/upgrade test Sc/config migration script for v7 to v9 May 29, 2025
@so-schen so-schen changed the title Sc/config migration script for v7 to v9 v7 to v9 config migration script May 29, 2025
path = "./configs/rpc_archive"
# Whether the database should be pruned. If `true`, data that is more than `epochs_to_retain`
# old will be deleted.
enable_pruning = false
Contributor

As I mentioned in my comment on the readme, I now think we should set this to true for all databases on both nodes.

Author

@so-schen so-schen Jun 4, 2025

Our archive RPC should not set it to true, so care is needed when using --assume-yes.
We can set the default to true and make DevOps aware that it should be kept as false when migrating an archive RPC.

Contributor

Good point. How do you suggest that we ensure the operators activate it then? Tell them not to update the value manually in the release docs?

Author

@so-schen so-schen Jun 4, 2025

Operators should always run in prompt mode, because an operator may want to run an RPC node as an archive.
I also agree it is better to set the default to true if the prune function is now ready for RPC. The validator template is already set to true. The update for RPC is done in a496619.

It is easier to communicate with our DevOps team than with operators, so it is OK to change the default to true, as long as I ask DevOps to always run migrate-config manually for specially designated nodes (i.e. archive).
Similarly, since enable_snapshots defaults to false, I also need to tell DevOps not to use --assume-yes for snapshot uploader nodes.
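
One way to back up the manual process for special nodes is a post-migration check that the sensitive flags ended up with the intended values. The following is a minimal sketch under assumptions: it works on an already parsed config (for example, a dict produced by a TOML parser), the `find_key` and `unexpected_values` helpers are hypothetical, and the nested layout in the example is invented for illustration rather than taken from the real config files. Arrays of tables are not traversed.

```python
def find_key(config, key, path=()):
    """Yield (dotted_path, value) for every occurrence of key in nested dicts."""
    for name, value in config.items():
        here = path + (name,)
        if name == key:
            yield ".".join(here), value
        elif isinstance(value, dict):
            yield from find_key(value, key, here)


def unexpected_values(config, expected):
    """Return [(dotted_path, actual, wanted)] for keys that differ from expected."""
    problems = []
    for key, wanted in expected.items():
        for dotted, actual in find_key(config, key):
            if actual != wanted:
                problems.append((dotted, actual, wanted))
    return problems


if __name__ == "__main__":
    # Hypothetical layout of a migrated archive RPC config.
    migrated = {
        "database_setup": {
            "archive": {"enable_pruning": True, "enable_snapshots": False},
            "chain_store": {"enable_snapshots": False},
        }
    }
    # An archive node is expected to keep pruning disabled.
    for dotted, actual, wanted in unexpected_values(migrated, {"enable_pruning": False}):
        print(f"{dotted} = {actual!r}, expected {wanted!r}")
```

A non-empty result means the migrated file should not be deployed to that node until the flagged values are corrected.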

@@ -615,6 +615,12 @@ EOF
if [ "$NETWORK" == "mainnet" ]; then
export AWS_ACCESS_KEY_ID="c64bed98a85ccd3197169bf7363ce94f"
export AWS_SECRET_ACCESS_KEY="0b7f15dbeef4ebe871ee8ce483e3fc8bab97be0da6a362b2c4d80f020cae9df7"

if is_validator; then
Contributor

These buckets don't exist yet, so we can't make this update just yet. Both types of nodes need to default to the mainnet-data bucket for now.

Author

I intended to put a placeholder here and then ask DevOps to create the buckets.

Contributor

@isaacdoidge isaacdoidge Jun 4, 2025

Okay, but this script is currently regularly downloaded by the operators, so we won't be able to merge it until the new buckets are in place.

Author

OK, that is fine. We can verify that it works before merging.
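
The bucket choice discussed in this thread can be summarised in a short sketch. It is illustrative only and is not the logic in the shell script: the `select_bucket` helper and the `new_buckets_ready` flag are hypothetical, while the three bucket names are the ones mentioned in the PR description and in the comments above.

```python
# New per-role buckets proposed in the PR description.
NEW_BUCKETS = {
    "validator": "mainnet-validator-snapshot",
    "rpc": "mainnet-snapshot",
}
# Bucket both node types should use until the new ones exist.
FALLBACK_BUCKET = "mainnet-data"


def select_bucket(node_type, new_buckets_ready):
    """Pick the snapshot bucket for a mainnet node.

    new_buckets_ready is a hypothetical switch that stays False until
    DevOps has created the per-role buckets.
    """
    if node_type not in NEW_BUCKETS:
        raise ValueError(f"unknown node type: {node_type}")
    return NEW_BUCKETS[node_type] if new_buckets_ready else FALLBACK_BUCKET


if __name__ == "__main__":
    for ready in (False, True):
        for node_type in ("validator", "rpc"):
            print(node_type, ready, select_bucket(node_type, ready))
```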

# Finished migration, and follow guide that start the node with the new image with new config and sync
# ------

# ./manage_supra_nodes.sh \
Contributor

Is this code still required?

Author

I intended to keep it in case we need to create a script for the operator use case,
i.e. migrate the config only, then run sync.

echo "Migrate cli profile from v8 to v9"
supra-v9 profile migrate
echo "Migrate smr_settings from v7 to v9"
# TODO(SC) to be run in docker context, update path to `./configs/config.toml`
Contributor

Is this still WIP?

Author

Yes, because packaging the migrate-config script into the Docker image is not ready yet.

echo "Migrate db from v7 to v8"
rpc-v8 migrate-db configs/config.toml
echo "Migrate db from v8 to v9"
rpc-v9 migrate-db configs/config.toml
Contributor

We don't want to run the v9 DB migration because it will take far too long. As with the testnet release, we'll run the migration ourselves, upload the snapshot and then ask the operators to download it.

Author

This script is intended to be used by us, not by operators.
As I see it, operators will be guided by the user guide to run migrate-config and sync manually.

Author

@so-schen so-schen Jun 4, 2025

The script is intended for us for now. I thought we would provide a user guide for operators to upgrade manually; if needed, we can create a new script for the operator use case.

Contributor

We don't want to run this migration even on our own nodes. It takes 24+ hours. Syncing the snapshot takes maybe an hour.

Contributor

We also don't use Docker on our machines.

Author

@so-schen so-schen Jun 4, 2025

We need to run it on a pair of RPC + validator snapshot uploader nodes, right?
I remember we also had a few nodes running the Docker image, or is everything native now?
The Docker image is only used for running the migration and is then removed; it is not used for starting the nodes.
If running the migration in the Docker image is less performant, we can change it to use the native build.

We can also rename the script to make clear that, as a first step, it is for Supra DevOps only. We can then create a script for the operator use case later.

# Localnet only (Optional: local env path is different from docker env path, need to be modified to use docker env path)
sed -i "" "s#${HOST_SUPRA_HOME}#configs#g" ${HOST_SUPRA_HOME}/smr_settings.toml
echo "Migrate db from v7 to v9"
supra-v9 data migrate -p configs/smr_settings.toml
Contributor

Let's not do this either. We'll migrate the database of one of our validators, then upload a snapshot for the operators to sync from. I'm concerned that anyone who has synced the snapshot in mainnet will end up having to migrate the full history of the chain store due to the RPC node's version of the database not having any entries in the prune index for historical values.

Author

As explained above regarding the purpose of the script.

Contributor

Same response. Migrating will take too long if the full database has to be migrated.

Author

@so-schen so-schen Jun 4, 2025

I guess the name of the script is what makes this confusing, or perhaps the script should not be in this repo.
The script is intended for upgrading the snapshot uploader nodes, not all nodes. We can add a script for the operator use case and test it on an RPC + validator pair on mainnet once the snapshot uploaders are done.
After that, we can provide the verified operator-specific script to operators.

@so-schen so-schen requested review from darpan-supraoracles and removed request for ninad-supraoracles June 5, 2025 07:02